Overview

Dataset statistics

Number of variables: 13
Number of observations: 5061
Missing cells: 2
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 514.1 KiB
Average record size in memory: 104.0 B
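
The percentage and per-record figures above follow from the raw counts. The short sketch below re-derives them, assuming that "missing cells (%)" is missing cells divided by variables times observations and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 13
n_observations = 5061
missing_cells = 2
total_size_kib = 514.1

# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.4f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The missing share is well below the 0.1% display threshold, which is why the report shows "< 0.1%" rather than a number.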

Variable types

NUM: 12
DATE: 1

Reproduction

Analysis started: 2020-07-07 19:30:16.312975
Analysis finished: 2020-07-07 19:30:47.500818
Duration: 31.19 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

High is highly correlated with Open and 3 other fields (High correlation)
Open is highly correlated with High and 3 other fields (High correlation)
Low is highly correlated with Open and 3 other fields (High correlation)
Close is highly correlated with Open and 3 other fields (High correlation)
Turnover is highly correlated with Volume (High correlation)
Volume is highly correlated with Turnover (High correlation)
year is highly correlated with Open and 3 other fields (High correlation)
Date has unique values (Unique)

Variables

Date
Date

UNIQUE

Distinct count: 5061
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 39.5 KiB
Minimum: 2000-01-03 00:00:00
Maximum: 2020-05-08 00:00:00
Histogram

Open
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 4959
Unique (%): 98.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5216.23797668445
Minimum: 853.0
Maximum: 12430.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 853
5-th percentile: 1054.95
Q1: 1983.2
Median: 5096.7
Q3: 7895.4
95-th percentile: 11106.55
Maximum: 12430.5
Range: 11577.5
Interquartile range (IQR): 5912.2

Descriptive statistics

Standard deviation: 3274.529979
Coefficient of variation (CV): 0.6277570144
Kurtosis: -0.9093646122
Mean: 5216.237977
Median Absolute Deviation (MAD): 2998.45
Skewness: 0.4138808432
Sum: 26399380.4
Variance: 10722546.58
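
Several of the figures reported for Open are derived from others in the same block. As a rough cross-check, the sketch below recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation. It uses only the standard definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared); the variable names are illustrative.

```python
# Values copied from the "Open" variable block.
minimum, maximum = 853.0, 12430.5
q1, q3 = 1983.2, 7895.4
mean = 5216.237977
std = 3274.529979

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance

print(f"range: {value_range:.1f}")
print(f"IQR: {iqr:.1f}")
print(f"CV: {cv:.6f}")
print(f"variance: {variance:.2f}")
```

The same relations can be checked against the other numeric variables in this report.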
Histogram with fixed size bins (bins=10)
Value                 Count   Frequency (%)
1070.95               3       0.1%
1036.35               3       0.1%
2057.75               2       < 0.1%
5688.45               2       < 0.1%
1085.95               2       < 0.1%
7636.05               2       < 0.1%
962.85                2       < 0.1%
5114.7                2       < 0.1%
4166                  2       < 0.1%
972.05                2       < 0.1%
Other values (4949)   5039    99.6%

Value                 Count   Frequency (%)
853                   1       < 0.1%
861.35                1       < 0.1%
869.15                1       < 0.1%
872.15                1       < 0.1%
873.15                1       < 0.1%

Value                 Count   Frequency (%)
12430.5               1       < 0.1%
12349.4               1       < 0.1%
12347.1               1       < 0.1%
12333.1               1       < 0.1%
12328.4               1       < 0.1%

High
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 4975
Unique (%): 98.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5251.466666666666
Minimum: 877.0
Maximum: 12430.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 877
5-th percentile: 1063.7
Q1: 1999.7
Median: 5135.55
Q3: 7929.1
95-th percentile: 11146.9
Maximum: 12430.5
Range: 11553.5
Interquartile range (IQR): 5929.4

Descriptive statistics

Standard deviation: 3283.745772
Coefficient of variation (CV): 0.625300698
Kurtosis: -0.9116963277
Mean: 5251.466667
Median Absolute Deviation (MAD): 3009
Skewness: 0.407765243
Sum: 26577672.8
Variance: 10782986.3
Histogram with fixed size bins (bins=10)
Value                 Count   Frequency (%)
994.15                3       0.1%
1072                  2       < 0.1%
1837.95               2       < 0.1%
964.2                 2       < 0.1%
5924.6                2       < 0.1%
8229.4                2       < 0.1%
1574.1                2       < 0.1%
1070.15               2       < 0.1%
1098.6                2       < 0.1%
4434.45               2       < 0.1%
Other values (4965)   5040    99.6%

Value                 Count   Frequency (%)
877                   1       < 0.1%
878.6                 1       < 0.1%
893.05                1       < 0.1%
893.35                1       < 0.1%
903.75                1       < 0.1%

Value                 Count   Frequency (%)
12430.5               1       < 0.1%
12389.05              1       < 0.1%
12385.45              1       < 0.1%
12374.25              1       < 0.1%
12355.15              1       < 0.1%

Low
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 4956
Unique (%): 97.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5174.455008891523
Minimum: 849.95
Maximum: 12321.4
Zeros: 0
Zeros (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 849.95
5-th percentile: 1045.55
Q1: 1964.65
Median: 5038.85
Q3: 7837.7
95-th percentile: 11037.85
Maximum: 12321.4
Range: 11471.45
Interquartile range (IQR): 5873.05

Descriptive statistics

Standard deviation: 3257.303609
Coefficient of variation (CV): 0.6294969428
Kurtosis: -0.9087343908
Mean: 5174.455009
Median Absolute Deviation (MAD): 2972.3
Skewness: 0.4181549213
Sum: 26187916.8
Variance: 10610026.8
Histogram with fixed size bins (bins=10)
Value                 Count   Frequency (%)
1043.3                3       0.1%
1034.1                3       0.1%
4143.25               2       < 0.1%
2611.95               2       < 0.1%
5975.55               2       < 0.1%
1096.25               2       < 0.1%
1071.35               2       < 0.1%
1448.95               2       < 0.1%
5910.95               2       < 0.1%
1355.35               2       < 0.1%
Other values (4946)   5039    99.6%

Value                 Count   Frequency (%)
849.95                1       < 0.1%
853                   1       < 0.1%
858.85                1       < 0.1%
859.2                 1       < 0.1%
861.05                1       < 0.1%

Value                 Count   Frequency (%)
12321.4               1       < 0.1%
12315.8               1       < 0.1%
12308.7               1       < 0.1%
12285.8               1       < 0.1%
12278.75              1       < 0.1%

Close
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 4947
Unique (%): 97.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5213.433768030034
Minimum: 854.2
Maximum: 12362.3
Zeros: 0
Zeros (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 854.2
5-th percentile: 1054.8
Q1: 1982.75
Median: 5090.85
Q3: 7887.8
95-th percentile: 11105.35
Maximum: 12362.3
Range: 11508.1
Interquartile range (IQR): 5905.05

Descriptive statistics

Standard deviation: 3270.073955
Coefficient of variation (CV): 0.6272399536
Kurtosis: -0.9106763981
Mean: 5213.433768
Median Absolute Deviation (MAD): 2995.95
Skewness: 0.4122485421
Sum: 26385188.3
Variance: 10693383.67
Histogram with fixed size bins (bins=10)
Value                 Count   Frequency (%)
1067                  3       0.1%
954.75                3       0.1%
5274.85               3       0.1%
5486.15               3       0.1%
1070.65               3       0.1%
1110.45               3       0.1%
963.25                2       < 0.1%
4359.3                2       < 0.1%
1286.75               2       < 0.1%
1358.05               2       < 0.1%
Other values (4937)   5035    99.5%

Value                 Count   Frequency (%)
854.2                 1       < 0.1%
861.4                 1       < 0.1%
869.05                1       < 0.1%
872.25                1       < 0.1%
873.7                 1       < 0.1%

Value                 Count   Frequency (%)
12362.3               1       < 0.1%
12355.5               1       < 0.1%
12352.35              1       < 0.1%
12343.3               1       < 0.1%
12329.55              1       < 0.1%

Volume
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 5059
Unique (%): > 99.9%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 166441646.00790513
Minimum: 1394931.0
Maximum: 1811564187.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 1394931
5-th percentile: 38776031.4
Q1: 77723025.75
Median: 137086023
Q3: 198302099
95-th percentile: 415238071.3
Maximum: 1811564187
Range: 1810169256
Interquartile range (IQR): 120579073.2

Descriptive statistics

Standard deviation: 141956679.2
Coefficient of variation (CV): 0.8528915845
Kurtosis: 18.65450862
Mean: 166441646
Median Absolute Deviation (MAD): 60112828.5
Skewness: 3.298774292
Sum: 8.421947288e+11
Variance: 2.015169877e+16
Histogram with fixed size bins (bins=10)
Value                 Count   Frequency (%)
223135459             2       < 0.1%
122278698             1       < 0.1%
525674715             1       < 0.1%
45786733              1       < 0.1%
149592578             1       < 0.1%
63804437              1       < 0.1%
107481297             1       < 0.1%
44738447              1       < 0.1%
231382790             1       < 0.1%
125128736             1       < 0.1%
Other values (5049)   5049    99.8%

Value                 Count   Frequency (%)
1394931               1       < 0.1%
2768292               1       < 0.1%
6555703               1       < 0.1%
7991165               1       < 0.1%
9774392               1       < 0.1%

Value                 Count   Frequency (%)
1811564187            1       < 0.1%
1566119057            1       < 0.1%
1517275761            1       < 0.1%
1414837250            1       < 0.1%
1389061775            1       < 0.1%

Turnover
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 5054
Unique (%): 99.9%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 68261867766.798416
Minimum: 401200000.0
Maximum: 597055300000.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 401200000
5-th percentile: 1.1863665e+10
Q1: 3.0176425e+10
Median: 5.79317e+10
Q3: 8.4935175e+10
95-th percentile: 1.8085872e+11
Maximum: 5.970553e+11
Range: 5.966541e+11
Interquartile range (IQR): 5.475875e+10

Descriptive statistics

Standard deviation: 5.482144334e+10
Coefficient of variation (CV): 0.8031049418
Kurtosis: 8.215151231
Mean: 6.826186777e+10
Median Absolute Deviation (MAD): 2.743195e+10
Skewness: 2.203935298
Sum: 3.454050509e+14
Variance: 3.00539065e+21
Histogram with fixed size bins (bins=10)
Value                 Count   Frequency (%)
1.096895e+11          2       < 0.1%
9929300000            2       < 0.1%
1.38814e+10           2       < 0.1%
2.50291e+10           2       < 0.1%
2.82538e+10           2       < 0.1%
5.25239e+10           2       < 0.1%
9.59185e+10           1       < 0.1%
1.113904e+11          1       < 0.1%
2.74691e+10           1       < 0.1%
8.19595e+10           1       < 0.1%
Other values (5044)   5044    99.7%

Value                 Count   Frequency (%)
401200000             1       < 0.1%
1139900000            1       < 0.1%
2896000000            1       < 0.1%
2978900000            1       < 0.1%
3557400000            1       < 0.1%

Value                 Count   Frequency (%)
5.970553e+11          1       < 0.1%
5.408153e+11          1       < 0.1%
4.603062e+11          1       < 0.1%
4.416789e+11          1       < 0.1%
4.187864e+11          1       < 0.1%

P/E
Real number (ℝ≥0)

Distinct count: 1576
Unique (%): 31.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.03701442402687
Minimum: 10.68
Maximum: 29.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 10.68
5-th percentile: 13.51
Q1: 17.11
Median: 20.09
Q3: 22.86
95-th percentile: 27.32
Maximum: 29.9
Range: 19.22
Interquartile range (IQR): 5.75

Descriptive statistics

Standard deviation: 4.194021036
Coefficient of variation (CV): 0.2093136705
Kurtosis: -0.6892830683
Mean: 20.03701442
Median Absolute Deviation (MAD): 2.89
Skewness: 0.1375321807
Sum: 101407.33
Variance: 17.58981245
Histogram with fixed size bins (bins=10)
Value                 Count   Frequency (%)
20.82                 14      0.3%
20.58                 12      0.2%
18.38                 11      0.2%
21.19                 11      0.2%
18.66                 11      0.2%
20.31                 10      0.2%
18.28                 10      0.2%
20.34                 10      0.2%
17.85                 10      0.2%
21.59                 10      0.2%
Other values (1566)   4952    97.8%

Value                 Count   Frequency (%)
10.68                 1       < 0.1%
10.84                 1       < 0.1%
10.86                 2       < 0.1%
10.9                  1       < 0.1%
10.93                 1       < 0.1%

Value                 Count   Frequency (%)
29.9                  1       < 0.1%
29.73                 1       < 0.1%
29.69                 1       < 0.1%
29.68                 1       < 0.1%
29.65                 1       < 0.1%

P/B
Real number (ℝ≥0)

Distinct count: 418
Unique (%): 8.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.54745900019759
Minimum: 1.92
Maximum: 6.55
Zeros: 0
Zeros (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 1.92
5-th percentile: 2.36
Q1: 3.03
Median: 3.47
Q3: 3.79
95-th percentile: 5.21
Maximum: 6.55
Range: 4.63
Interquartile range (IQR): 0.76

Descriptive statistics

Standard deviation: 0.7992038433
Coefficient of variation (CV): 0.2252890994
Kurtosis: 1.055125527
Mean: 3.547459
Median Absolute Deviation (MAD): 0.39
Skewness: 0.9052240412
Sum: 17953.69
Variance: 0.6387267832
Histogram with fixed size bins (bins=10)
Value                 Count   Frequency (%)
3.44                  55      1.1%
3.52                  52      1.0%
3.57                  49      1.0%
3.45                  49      1.0%
3.48                  48      0.9%
3.49                  48      0.9%
3.58                  48      0.9%
3.5                   46      0.9%
3.4                   46      0.9%
3.7                   46      0.9%
Other values (408)    4574    90.4%

Value                 Count   Frequency (%)
1.92                  1       < 0.1%
1.94                  1       < 0.1%
1.96                  1       < 0.1%
1.97                  2       < 0.1%
2.01                  1       < 0.1%

Value                 Count   Frequency (%)
6.55                  1       < 0.1%
6.54                  1       < 0.1%
6.53                  2       < 0.1%
6.47                  1       < 0.1%
6.46                  1       < 0.1%

Div Yield
Real number (ℝ≥0)

Distinct count: 236
Unique (%): 4.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.428164394388461
Minimum: 0.59
Maximum: 3.18
Zeros: 0
Zeros (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 0.59
5-th percentile: 0.95
Q1: 1.18
Median: 1.33
Q3: 1.55
95-th percentile: 2.28
Maximum: 3.18
Range: 2.59
Interquartile range (IQR): 0.37

Descriptive statistics

Standard deviation: 0.4023795937
Coefficient of variation (CV): 0.2817459917
Kurtosis: 2.544773961
Mean: 1.428164394
Median Absolute Deviation (MAD): 0.19
Skewness: 1.454176845
Sum: 7227.94
Variance: 0.1619093374
Histogram with fixed size bins (bins=10)
Value                 Count   Frequency (%)
1.25                  155     3.1%
1.26                  118     2.3%
1.27                  111     2.2%
1.22                  109     2.2%
1.23                  105     2.1%
1.24                  100     2.0%
1.28                  84      1.7%
1.29                  82      1.6%
1.36                  79      1.6%
1.42                  78      1.5%
Other values (226)    4040    79.8%

Value                 Count   Frequency (%)
0.59                  2       < 0.1%
0.6                   6       0.1%
0.61                  7       0.1%
0.62                  4       0.1%
0.63                  2       < 0.1%

Value                 Count   Frequency (%)
3.18                  1       < 0.1%
3.17                  2       < 0.1%
3.16                  1       < 0.1%
3.15                  2       < 0.1%
3.13                  1       < 0.1%

year
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 21
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2009.6437462951985
Minimum: 2000
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 2000
5-th percentile: 2001
Q1: 2005
Median: 2010
Q3: 2015
95-th percentile: 2019
Maximum: 2020
Range: 20
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 5.877213225
Coefficient of variation (CV): 0.00292450502
Kurtosis: -1.199090787
Mean: 2009.643746
Median Absolute Deviation (MAD): 5
Skewness: 0.01463027409
Sum: 10170807
Variance: 34.54163529
Histogram with fixed size bins (bins=10)
Value                 Count   Frequency (%)
2004                  254     5.0%
2003                  254     5.0%
2010                  252     5.0%
2012                  251     5.0%
2002                  251     5.0%
2005                  251     5.0%
2000                  250     4.9%
2013                  250     4.9%
2006                  250     4.9%
2007                  249     4.9%
Other values (11)     2549    50.4%

Value                 Count   Frequency (%)
2000                  250     4.9%
2001                  248     4.9%
2002                  251     5.0%
2003                  254     5.0%
2004                  254     5.0%

Value                 Count   Frequency (%)
2020                  87      1.7%
2019                  245     4.8%
2018                  246     4.9%
2017                  248     4.9%
2016                  247     4.9%

month
Real number (ℝ≥0)

Distinct count: 12
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.441414740169927
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 6
Q3: 9
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.452497506
Coefficient of variation (CV): 0.535984352
Kurtosis: -1.203710545
Mean: 6.44141474
Median Absolute Deviation (MAD): 3
Skewness: 0.01801769596
Sum: 32600
Variance: 11.91973903
Histogram with fixed size bins (bins=10)
Value                 Count   Frequency (%)
1                     448     8.9%
7                     439     8.7%
5                     435     8.6%
3                     433     8.6%
6                     427     8.4%
8                     421     8.3%
12                    419     8.3%
2                     415     8.2%
9                     409     8.1%
10                    408     8.1%
Other values (2)      807     15.9%

Value                 Count   Frequency (%)
1                     448     8.9%
2                     415     8.2%
3                     433     8.6%
4                     403     8.0%
5                     435     8.6%

Value                 Count   Frequency (%)
12                    419     8.3%
11                    404     8.0%
10                    408     8.1%
9                     409     8.1%
8                     421     8.3%
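
The frequency column in these tables can be reproduced from the counts. The sketch below does so for month, assuming the percentage is the count divided by the total number of observations (5061) and rounded to one decimal place; `frequency_pct` is a hypothetical helper, not part of pandas-profiling.

```python
n_observations = 5061  # from the dataset statistics

# (month, count) pairs copied from the first frequency table for month.
top_counts = [(1, 448), (7, 439), (5, 435), (3, 433), (6, 427),
              (8, 421), (12, 419), (2, 415), (9, 409), (10, 408)]


def frequency_pct(count, total):
    """Share of `total` represented by `count`, in percent, one decimal."""
    return round(100 * count / total, 1)


# Whatever the ten listed values do not cover falls into "Other values".
other_count = n_observations - sum(count for _, count in top_counts)

for month, count in top_counts:
    print(month, count, f"{frequency_pct(count, n_observations)}%")
print("Other values", other_count,
      f"{frequency_pct(other_count, n_observations)}%")
```

The remainder of 807 matches the two months missing from the top ten (April with 403 and November with 404).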
 

day
Real number (ℝ≥0)

Distinct count: 31
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.791345583876705
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 8
Median: 16
Q3: 23
95-th percentile: 30
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 8.783717205
Coefficient of variation (CV): 0.5562361458
Kurtosis: -1.196499001
Mean: 15.79134558
Median Absolute Deviation (MAD): 8
Skewness: 0.01089232119
Sum: 79920
Variance: 77.15368793
Histogram with fixed size bins (bins=10)
Value                 Count   Frequency (%)
7                     173     3.4%
4                     172     3.4%
11                    172     3.4%
28                    172     3.4%
5                     172     3.4%
27                    172     3.4%
23                    171     3.4%
3                     171     3.4%
16                    170     3.4%
20                    170     3.4%
Other values (21)     3346    66.1%

Value                 Count   Frequency (%)
1                     153     3.0%
2                     148     2.9%
3                     171     3.4%
4                     172     3.4%
5                     172     3.4%

Value                 Count   Frequency (%)
31                    102     2.0%
30                    155     3.1%
29                    156     3.1%
28                    172     3.4%
27                    172     3.4%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
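
As an illustration of the calculation just described, here is a minimal pure-Python sketch. The function name `pearson_r` is illustrative and is not the routine the report itself uses; the inputs are the Open and Close values from the first five sample rows of this report.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n-1) normalisation cancels between numerator and denominator,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Open and Close from the first five sample rows.
open_prices = [1482.15, 1594.40, 1634.55, 1595.80, 1616.60]
close_prices = [1592.20, 1638.70, 1595.80, 1617.60, 1613.30]

print(pearson_r(open_prices, close_prices))

# Shifting and positively rescaling one variable leaves r unchanged.
rescaled = [2.0 * c + 5.0 for c in close_prices]
print(pearson_r(open_prices, rescaled))
```

Five rows are far too few to say anything about the dataset; the point is only to show the mechanics.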

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one replaces each value by its rank and then divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
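
A minimal sketch of that procedure in pure Python follows. It assumes that tied values receive the average of the ranks they span, which is a common convention but is not stated in this report; `average_ranks` and `spearman_rho` are illustrative names.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A monotonic but nonlinear relationship still yields rho = 1.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))
```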

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the difference between the number of concordant pairs and the number of discordant pairs, divided by the total number of pairs.
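
The pair-counting definition can be sketched directly. Note that this is the simple variant without tie correction (often called tau-a); library implementations frequently apply a tie-corrected variant, so results may differ when the data contain ties. The function name `kendall_tau_a` is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs, with no correction for ties."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# One swapped neighbour among four observations: 5 concordant, 1 discordant pair.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

This brute-force version examines every pair and is therefore quadratic in the number of observations, which is fine for illustration but slow for long series.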

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

Sample

First rows

      Date        Open     High     Low      Close    Volume       Turnover      P/E    P/B   Div Yield  year  month  day
0     2000-01-03  1482.15  1592.90  1482.15  1592.20  25358322.0   8.841500e+09  25.91  4.63  0.95       2000  1      3
1     2000-01-04  1594.40  1641.95  1594.40  1638.70  38787872.0   1.973690e+10  26.67  4.76  0.92       2000  1      4
2     2000-01-05  1634.55  1635.50  1555.05  1595.80  62153431.0   3.084790e+10  25.97  4.64  0.95       2000  1      5
3     2000-01-06  1595.80  1639.00  1595.80  1617.60  51272875.0   2.531180e+10  26.32  4.70  0.94       2000  1      6
4     2000-01-07  1616.60  1628.25  1597.20  1613.30  54315945.0   1.914630e+10  26.25  4.69  0.94       2000  1      7
5     2000-01-10  1615.65  1662.10  1614.95  1632.95  45013949.0   2.375350e+10  26.57  4.74  0.93       2000  1      10
6     2000-01-11  1633.25  1639.90  1548.25  1572.50  49120254.0   2.596950e+10  25.59  4.57  0.96       2000  1      11
7     2000-01-12  1572.30  1631.55  1571.70  1624.80  38364961.0   1.895000e+10  26.44  4.72  0.93       2000  1      12
8     2000-01-13  1627.85  1671.15  1613.65  1621.40  44738447.0   2.237610e+10  26.38  4.71  0.93       2000  1      13
9     2000-01-14  1622.15  1627.40  1591.40  1622.75  43292009.0   1.979980e+10  26.41  4.71  0.93       2000  1      14

Last rows

      Date        Open     High     Low      Close    Volume       Turnover      P/E    P/B   Div Yield  year  month  day
5051  2020-04-24  9163.90  9296.90  9141.30  9154.40  659439249.0  3.285905e+11  20.48  2.61  1.66       2020  4      24
5052  2020-04-27  9259.70  9377.10  9250.35  9282.30  512793298.0  2.669654e+11  20.77  2.65  1.64       2020  4      27
5053  2020-04-28  9389.80  9404.40  9260.00  9380.90  614548983.0  3.009141e+11  21.00  2.67  1.62       2020  4      28
5054  2020-04-29  9408.60  9599.85  9392.35  9553.35  653026950.0  3.167323e+11  21.65  2.72  1.59       2020  4      29
5055  2020-04-30  9753.50  9889.05  9731.50  9859.90  931173802.0  3.933246e+11  22.35  2.81  1.54       2020  4      30
5056  2020-05-04  9533.50  9533.50  9266.95  9293.50  NaN          NaN           21.39  2.65  1.64       2020  5      4
5057  2020-05-05  9429.40  9450.90  9190.75  9205.60  725196178.0  2.970020e+11  21.19  2.62  1.65       2020  5      5
5058  2020-05-06  9226.80  9346.90  9116.50  9270.90  722185448.0  3.079810e+11  21.34  2.64  1.64       2020  5      6
5059  2020-05-07  9234.05  9277.85  9175.90  9199.05  708740416.0  5.970553e+11  21.18  2.62  1.65       2020  5      7
5060  2020-05-08  9376.95  9382.65  9238.20  9251.50  609053504.0  3.074345e+11  21.28  2.64  1.64       2020  5      8
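
In the sample rows, the year, month and day columns agree with the calendar components of Date. Assuming they were derived that way (the report does not say how they were produced), the relationship can be sketched as follows; `split_date` is a hypothetical helper.

```python
from datetime import date


def split_date(iso_date):
    """Return (year, month, day) for a YYYY-MM-DD string."""
    parsed = date.fromisoformat(iso_date)
    return parsed.year, parsed.month, parsed.day


# First and last Date values from the sample rows.
print(split_date("2000-01-03"))
print(split_date("2020-05-08"))
```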